Coordinate descent algorithm


Granger Components Analysis: Unsupervised learning of latent temporal dependencies

Neural Information Processing Systems

Here the concept of Granger causality is employed to propose a new criterion for unsupervised learning that is appropriate in the case of temporally-dependent source signals. The basic idea is to identify two projections of a multivariate time series such that the Granger causality between the resulting pair of components is maximized.
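As an illustration of the underlying criterion (not the paper's algorithm), a one-lag Granger statistic between two scalar series can be computed by comparing the residual variances of two nested autoregressions: predicting y[t] from y[t-1] alone versus from (y[t-1], x[t-1]). The function name `granger_stat` and the one-lag restriction are my simplifying assumptions.

```python
import numpy as np

def granger_stat(x, y):
    """One-lag Granger statistic: log ratio of the residual variance when
    predicting y[t] from y[t-1] alone vs. from (y[t-1], x[t-1]).
    Larger values indicate a stronger x -> y temporal dependence."""
    Y = y[1:]
    restricted = np.column_stack([y[:-1], np.ones(len(Y))])
    full = np.column_stack([y[:-1], x[:-1], np.ones(len(Y))])
    r_res = Y - restricted @ np.linalg.lstsq(restricted, Y, rcond=None)[0]
    f_res = Y - full @ np.linalg.lstsq(full, Y, rcond=None)[0]
    return np.log(np.var(r_res) / np.var(f_res))

# Toy check: y is driven by lagged x, so x "Granger-causes" y but not vice versa.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = np.zeros(2000)
y[1:] = 0.9 * x[:-1] + 0.1 * rng.standard_normal(1999)
stat_xy = granger_stat(x, y)  # clearly positive
stat_yx = granger_stat(y, x)  # near zero
```

The paper's criterion then searches over projection vectors so that the two resulting components maximize such a causality measure.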



Towards Practical Few-Shot Query Sets: Transductive Minimum Description Length Inference

Neural Information Processing Systems

In particular, for each task at testing time, the classes effectively present in the unlabeled query set are known a priori, and correspond exactly to the set of classes represented in the labeled support set.


Efficient Clustering Based On A Unified View Of K-means And Ratio-cut

Neural Information Processing Systems

In spite of its promising performance, ratio-cut and other traditional spectral clustering (SC) methods suffer from the following drawbacks: (1) the time complexity of traditional spectral clustering is O(n^2 c), which is one of the significant drawbacks of SC. Much effort has been devoted to accelerating the process.


Efficient Solvers for SLOPE in R, Python, Julia, and C++

Larsson, Johan, Bogdan, Malgorzata, Grzesiak, Krystyna, Massias, Mathurin, Wallin, Jonas

arXiv.org Machine Learning

We present a suite of packages in R, Python, Julia, and C++ that efficiently solve the Sorted L-One Penalized Estimation (SLOPE) problem. The packages feature a highly efficient hybrid coordinate descent algorithm that fits generalized linear models (GLMs) and supports a variety of loss functions, including Gaussian, binomial, Poisson, and multinomial logistic regression. Our implementation is designed to be fast, memory-efficient, and flexible. The packages support a variety of data structures (dense, sparse, and out-of-memory matrices) and are designed to efficiently fit the full SLOPE path as well as handle cross-validation of SLOPE models, including the relaxed SLOPE. We present examples of how to use the packages, along with benchmarks on both real and simulated data showing that our packages outperform existing SLOPE implementations in terms of speed.
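A core primitive behind coordinate-descent SLOPE solvers is the proximal operator of the sorted L1 penalty. A minimal sketch of the well-known stack-based computation is below (variable and function names are mine; the packages' internal implementations differ and are far more optimized):

```python
import numpy as np

def prox_sorted_l1(x, lam):
    """Prox of the sorted-L1 penalty: argmin_b 0.5*||b - x||^2 + sum_i lam_i * |b|_(i),
    where |b|_(i) are the magnitudes sorted in decreasing order and lam is
    non-increasing. Uses a stack-based pool-adjacent-violators-style scheme."""
    x = np.asarray(x, dtype=float)
    lam = np.asarray(lam, dtype=float)
    sign = np.sign(x)
    order = np.argsort(-np.abs(x))         # work on |x| sorted decreasingly
    y = np.abs(x)[order]
    blocks = []                            # each block: [start, end, sum, length]
    for i in range(len(y)):
        blocks.append([i, i, y[i] - lam[i], 1])
        # merge while block averages violate the non-increasing constraint
        while len(blocks) > 1 and blocks[-2][2] / blocks[-2][3] <= blocks[-1][2] / blocks[-1][3]:
            last = blocks.pop()
            blocks[-1][1] = last[1]
            blocks[-1][2] += last[2]
            blocks[-1][3] += last[3]
    out = np.zeros_like(y)
    for start, end, total, length in blocks:
        out[start:end + 1] = max(total / length, 0.0)  # clip negatives to zero
    result = np.zeros_like(out)
    result[order] = out                    # undo the sorting
    return sign * result
```

For example, with x = [3, 1] and lam = [2, 1] the sorted differences [1, 0] are already non-increasing, so the prox is [1, 0]; with equal entries and violating differences, blocks are averaged, which is what produces SLOPE's characteristic coefficient clustering.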




Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: quality, clarity, originality, and significance. Summary: The authors propose a new Newton-like method to optimize the sum of a smooth (convex) cost function and multiple decomposable norms. Their contributions are (1) an active subspace selection procedure that speeds up the solution of the quadratic approximation problem, and (2) a proof that solving the quadratic approximation problem over the (changing) active subspace still leads to convergence. The authors also provide numerical results showing that, for two important problems, their method gives a 10x speedup over state-of-the-art methods and, in the appendix, give numerical results that illustrate which fraction of the speedup is due to the quadratic approximation technique and which fraction is due to the active subspace selection method. Quality: The amount of critical information in the appendix makes this paper more suited for a journal than a conference.
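For the L1-regularized special case, the active-subspace idea the review describes can be illustrated as follows: restrict the quadratic subproblem to coordinates that are either currently nonzero or violate the subgradient optimality condition at zero. This is a hedged sketch of the general idea, not the paper's method for general decomposable norms; `active_set` and its signature are mine.

```python
import numpy as np

def active_set(X, y, beta, lam):
    """Coordinates to keep in the quadratic subproblem for
    min 0.5*||y - X @ beta||^2 + lam*||beta||_1: those currently nonzero,
    plus those violating the optimality condition |grad_j| <= lam at beta_j = 0."""
    grad = X.T @ (X @ beta - y)
    return np.flatnonzero((beta != 0) | (np.abs(grad) > lam))

# Tiny demo: with an orthogonal design and beta = 0, only coordinates whose
# gradient magnitude exceeds lam enter the subspace.
idx = active_set(np.eye(3), np.array([2.0, 0.1, 5.0]), np.zeros(3), 1.0)
```

Solving the Newton step only over this (typically small) subspace is what yields the reported speedups, and the paper's convergence proof covers the fact that the subspace changes across iterations.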



An Asymptotically Optimal Coordinate Descent Algorithm for Learning Bayesian Networks from Gaussian Models

Xu, Tong, Küçükyavuz, Simge, Shojaie, Ali, Taeb, Armeen

arXiv.org Machine Learning

This paper studies the problem of learning Bayesian networks from continuous observational data, generated according to a linear Gaussian structural equation model. We consider an $\ell_0$-penalized maximum likelihood estimator for this problem which is known to have favorable statistical properties but is computationally challenging to solve, especially for medium-sized Bayesian networks. We propose a new coordinate descent algorithm to approximate this estimator and prove several remarkable properties of our procedure: the algorithm converges to a coordinate-wise minimum, and despite the non-convexity of the loss function, as the sample size tends to infinity, the objective value of the coordinate descent solution converges to the optimal objective value of the $\ell_0$-penalized maximum likelihood estimator. Finite-sample statistical consistency guarantees are also established. To the best of our knowledge, our proposal is the first coordinate descent procedure endowed with optimality and statistical guarantees in the context of learning Bayesian networks. Numerical experiments on synthetic and real data demonstrate that our coordinate descent method can obtain near-optimal solutions while being scalable.
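The key building block, minimizing an ℓ0-penalized objective one coordinate at a time, reduces to a hard-thresholding rule in each coordinate. The sketch below shows this for plain ℓ0-penalized least squares, not the paper's DAG-structured estimator; all names are mine.

```python
import numpy as np

def l0_coordinate_descent(X, y, lam, n_iter=50):
    """Cyclic coordinate descent for 0.5*||y - X @ b||^2 + lam*||b||_0.
    Minimizing over a single b_j with the rest fixed gives a hard-thresholding
    update: keep the least-squares value z only if it lowers the objective
    by more than lam (illustrative sketch, not the paper's algorithm)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()          # residual y - X @ b
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]         # remove coordinate j's contribution
            z = X[:, j] @ r / col_sq[j]
            # objective decrease from setting b_j = z is 0.5*||x_j||^2 * z^2
            b[j] = z if 0.5 * col_sq[j] * z * z > lam else 0.0
            r -= X[:, j] * b[j]
    return b

# Orthogonal toy example: the large coefficient survives thresholding,
# the small one is set exactly to zero.
b = l0_coordinate_descent(np.eye(2), np.array([3.0, 0.1]), lam=0.5)
```

The resulting fixed point is a coordinate-wise minimum in the sense the abstract describes: no single-coordinate change can decrease the objective.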